Speeding up Matlab
Outlined here are four approaches to achieving faster performance of Matlab code:

1) Optimized coding
2) Using embedded C functions
3) Using Matlab's parallel computing functionality
4) Utilizing the Graphics Processing Unit (GPU)

It should be stressed that the degree of performance enhancement achieved by any approach is highly variable and depends on many factors such as hardware specifications, array sizes, communication overheads, and properties of the specific computation performed. For instance, some Matlab functions (e.g. fft, conv, matrix multiplication) are already optimized, so converting them to C code gives no advantage, while some functions already make implicit use of parallel computing functionality, so parallelizing them explicitly gives no advantage. Experimentation should be used to see which method gives the best results.

This information was provided in a seminar given by Systematics on 13/4/2011.

Time-efficient programming

* Pre-allocating arrays
  Even though Matlab can handle dynamic sizing of arrays (i.e. adding entries after the variable has been declared), this is a very time-consuming process: each time the array size is increased, the entire array is copied to a new memory location with the increased size, the new entry is written, and the old copy is deleted. Declaring the variable with a fixed size in advance prevents this.

* Converting loops to vector operations
  As a general rule, a computation will run faster as a vector operation than as a loop. Useful functions to look up: bsxfun, arrayfun, meshgrid.

* Linear indexing
  When running a loop over an M*N matrix, using

      for i = 1:M*N
          % something with Data(i)
      end

  is faster than using

      for i = 1:M
          for j = 1:N
              % something with Data(i,j)
          end
      end

  because in the second case each pair of subscripts has to be converted to a linear index anyway. I tried it with a simple loop and decreased the runtime by about 30%.

* Using the single data type
  Using single instead of double variables, in addition to halving memory requirements, apparently speeds up computation as well: for matrix multiplication I got a 2-3x speed increase.

* Sparse matrices
  Computations on data which include a large number of zero elements (such as diagonal matrices) can be performed much faster if the matrix is declared as a sparse matrix. Sparse matrices store only the value and location of non-zero elements. This can be done as follows:

      S = sparse(A);
      % ... computation on S ...
      B = full(S);

* Column-wise iteration
  Since matrices are arranged in physical memory column-wise, loops should run faster if they access an array in that order (outer loop over columns, inner loop over rows). However, I tried this with a simple loop and it didn't make any difference.

Using embedded C functions

The idea is to take a Matlab function and automatically convert it into C code and then into a MEX file (a MEX file is simply compiled C code which can be called from Matlab). This approach works best when the I/O overhead (i.e. the data passed to and from the converted function) is small. All it takes is this command:

    emlmex myfunc

where myfunc is a regular Matlab function. A MEX function file is then created which can be called like a regular function. From v.2011a, emlmex is replaced by codegen. I tested this for a function that creates a 1000x1000 random matrix and inverts it, and got an 18x (!) speed increase.

There are some limitations as to the structure of the converted code. One such limitation is that the function's input variables need to be declared explicitly. This can be done as follows:

    function MyFunc(InputVar)
        % tells the compiler that each dimension of InputVar is smaller than 100,
        % so it can pre-allocate storage for InputVar, preventing errors when
        % this variable is accessed in the function
        assert(all(size(InputVar) < 100));
        ...
    end

To facilitate the conversion process, type %#eml at the beginning of the code.
This will cause M-Lint to comment on issues related to this conversion.

Note: this method does not support functions using the following: cell arrays, visualizations, sparse arrays, nested functions, objects, recursion.

Type mex -setup first to choose a compiler.

Using parallel computing

On the most basic level this implies simply using parfor instead of for (with some limitations on the functionality of the loop, such as no dependencies between iterations; see the documentation of the Parallel Computing Toolbox). The speed increase factor can be as much as the number of cores on your computer. The parfor statement needs to be preceded by the matlabpool command to set up the desired number of parallel processes.

A more advanced approach uses the command spmd to run a code section on multiple cores. This approach provides functions which can be called to communicate between cores, thus allowing dependencies between different iterations (e.g. one iteration waits until another is finished, receives output from it and continues).

An additional option is setting up a computing cluster. This is done by installing a Matlab Distributed Computing Server.

Utilizing the GPU

You can easily store data and run functions on the graphics card. About 120 Matlab functions are supported (critically for wavelet computation, conv and conv2 are only supported from 2011a). This is how it works:

    A = gpuArray(MyData);   % store data on the GPU
    B = fft(A);             % B is also stored on the GPU
    C = gather(B);          % C is now in the workspace

Generally you'll get larger speed increases when processing larger matrices.

There is also a very simple way to run an entire function on the GPU.
This can be done as follows:

    gpufunc = @myfunc;                    % create a function handle
    gpudata = gpuArray(data);             % move the data to the GPU
    results = arrayfun(gpufunc, gpudata); % run the function on the GPU

A more advanced method (which can achieve very large speed increases) is to implement the function explicitly using CUDA (the programming language for NVIDIA's GPUs).

NOTE: The GPU functionality requires Matlab version 2010b or higher, the Parallel Computing Toolbox, and an NVIDIA GPU with a compute capability of 1.3 or higher. Type gpuDevice to get your GPU specifications.
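
Since the gain from each approach varies so much, it may help to time the alternatives directly before committing to one. The following is only a minimal sketch of such an experiment for the pre-allocation and vectorization advice from the time-efficient programming section. It assumes base Matlab only (no toolboxes); the array length N and all variable names are made up for illustration, and the timings will differ between machines and Matlab versions.

```matlab
% Illustrative benchmark; N and the variable names are hypothetical.
N = 1e5;

% 1) Array grown inside the loop (no pre-allocation)
tic;
grown = [];
for k = 1:N
    grown(k) = k * k;      % the array is resized on every iteration
end
tGrown = toc;

% 2) Pre-allocated array filled inside the loop
tic;
prealloc = zeros(1, N);    % fixed size declared in advance
for k = 1:N
    prealloc(k) = k * k;
end
tPrealloc = toc;

% 3) Vectorized computation, no loop at all
tic;
v = 1:N;
vectorized = v .* v;
tVectorized = toc;

% All three variants must produce the same result
assert(isequal(grown, prealloc, vectorized));

fprintf('grown: %.4f s, pre-allocated: %.4f s, vectorized: %.4f s\n', ...
        tGrown, tPrealloc, tVectorized);
```

Running the script a few times and with different values of N gives a rough idea of when each technique pays off. The same tic/toc pattern can be wrapped around a MEX, parfor, or GPU version of a computation.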